Linking Users Across Domains with Location Data: Theory and Validation
Abstract
Linking accounts of the same user across datasets, even when personally identifying information is removed or unavailable, is an important open problem studied in many contexts. Beyond its many practical applications (such as cross-domain analysis, recommendation, and link prediction), understanding this problem more generally informs us about the privacy implications of data disclosure. Previous work has typically addressed this question using either different portions of the same dataset or observations of the same behavior across thematically similar domains. In contrast, the general cross-domain case, where users have different profiles independently generated from a common but unknown pattern, raises new challenges, including difficulties in validation, and remains under-explored. In this paper, we address the reconciliation problem for location-based datasets and introduce a robust method for this general setting. Location datasets are a particularly fruitful domain to study: such records are frequently produced by users in an increasing number of applications and are highly sensitive, especially when linked to other datasets. Our main contribution is a generic and self-tunable algorithm that leverages any pair of sporadic location-based datasets to determine the most likely matching between the users they contain. While making very general assumptions on the patterns of mobile users, we show that the maximum weight matching we compute is provably correct. Although true cross-domain datasets are a rarity, our experimental evaluation uses two entirely new data collections, including one we crawled, on an unprecedented scale. The method we design outperforms naive rules and prior heuristics. As it combines both sparse and dense properties of location-based data and accounts for probabilistic dynamics of observation, it can be shown to be robust even when data gets sparse.
Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author’s site if the Material is used in electronic media. WWW 2016, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4143-1/16/04. http://dx.doi.org/10.1145/2872427.2883002
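The abstract frames reconciliation as a maximum weight matching between the users of two location datasets. The following Python sketch illustrates that general idea only. The scoring rule (counting shared location and time-bin buckets), the function names, and the toy records are all assumptions made for illustration; they are not the paper's scoring function or data, and the exhaustive search is a stand-in that is practical only for a handful of users.

```python
from itertools import permutations


def cooccurrence_score(records_a, records_b):
    """Hypothetical edge weight: number of (cell, time_bin) buckets
    that appear in both users' records. Not the paper's score."""
    return len(set(records_a) & set(records_b))


def best_matching(users_a, users_b):
    """Exhaustive maximum weight bipartite matching.

    users_a, users_b: dicts mapping a user id to a list of
    (cell, time_bin) tuples. Every user in users_a is matched to a
    distinct user in users_b. Cost grows factorially, so this is
    only suitable for very small inputs.
    """
    ids_a = sorted(users_a)
    ids_b = sorted(users_b)
    if len(ids_a) > len(ids_b):
        raise ValueError("users_a must not be larger than users_b")

    best_weight, best_pairs = -1, []
    for perm in permutations(ids_b, len(ids_a)):
        weight = sum(
            cooccurrence_score(users_a[a], users_b[b])
            for a, b in zip(ids_a, perm)
        )
        if weight > best_weight:
            best_weight, best_pairs = weight, list(zip(ids_a, perm))
    return best_pairs, best_weight


# Toy records (invented for illustration): (cell id, hour of day).
domain_a = {
    "a1": [("cell_3", 9), ("cell_7", 18)],
    "a2": [("cell_1", 9), ("cell_4", 12)],
}
domain_b = {
    "b1": [("cell_1", 9), ("cell_4", 12), ("cell_9", 20)],
    "b2": [("cell_3", 9), ("cell_7", 18)],
}

pairs, weight = best_matching(domain_a, domain_b)
print(pairs, weight)  # [('a1', 'b2'), ('a2', 'b1')] 4
```

For datasets of realistic size, a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be a more suitable way to compute the same kind of matching.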